Shallow Language Processing Architecture for Bulgarian

Authors

  • Hristo Tanev
  • Ruslan Mitkov
Abstract

This paper describes LINGUA, an architecture for text processing in Bulgarian. First, the pre-processing modules for tokenisation, sentence splitting, paragraph segmentation, part-of-speech tagging, clause chunking and noun phrase extraction are outlined. Next, the paper describes the anaphora resolution module in more detail. Evaluation results are reported for each processing task.

Similar resources

A Hybrid Approach for Deep Machine Translation

This paper presents a Hybrid Approach to Deep Machine Translation in the language direction from English to Bulgarian. The set-up uses pre- and post-processing modules as well as two-level transfer. The language resources that have been incorporated are: WordNets for both languages; a valency lexicon for Bulgarian; aligned parallel corpora. The architecture comprises a predominantly statistical c...

Verb Valency Descriptors for a Syntactic Treebank

An essential component of Language Engineering (LE) tools are verb class descriptors that provide information about the relations of the predicates to their arguments. The production of computationally tractable language resources necessitates the assignment of types of predicate-argument relations to a great variety of verb-centered structures: it is necessary to define not only the initial, c...

An XML Architecture for Shallow and Deep Processing

The paper presents a set of XML tools for natural language processing such as regular grammars, constraints, transformations, remove and insert operations. The architecture allows any combinations of the tools depending on the task and the concrete analysis. The main control mechanism is the backtracking which depends on achieving a particular subgoal in the analysis. The main advantage of the ...

Integrating deep and shallow natural language processing components: representations and hybrid architectures

We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processin...

Multilingual summarization system based on analyzing the discourse structure at MultiLing 2013

This paper describes the architecture of UAIC's Summarization system participating at MultiLing 2013. The architecture includes language independent text processing modules, but also modules that are adapted for one language or another. In our experiments, the languages under consideration are Bulgarian, German, Greek, English, and Romanian. Our method exploits the cohesion and coherence p...

Publication date: 2002